Scheduled Approximation and Incremental Enhancement for Accuracy-aware Personalized PageRank
Authors
F. Zhu · J. Ying: Zhejiang University City College, Hangzhou, China. E-mail: {zhufw, yingj}@zucc.edu.cn
Y. Fang: Institute for Infocomm Research, Singapore
K. C. Chang: University of Illinois at Urbana-Champaign, USA; Advanced Digital Sciences Center, Singapore
This material is based upon work partially supported by NSF Grant IIS 1018723, the Advanced Digital Sciences Center of the University of Illinois at Urbana-Champaign, and the Agency for Science, Technology and Research of Singapore. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the views of the funding agencies.
Abstract
As Personalized PageRank has been widely leveraged for ranking on a graph, the efficient computation of the Personalized PageRank Vector (PPV) has become a prominent issue. In this paper, we propose FastPPV, an approximate PPV computation algorithm that is incremental and accuracy-aware. Our approach hinges on a novel paradigm of scheduled approximation: the computation is partitioned and scheduled for processing in an “organized” way, such that we can gradually improve our PPV estimation in an incremental manner, and quantify the accuracy of our approximation at query time. Guided by this principle, we develop an efficient hub-based realization, where we adopt the metric of hub-length to partition and schedule random walk tours so that the approximation error reduces exponentially over iterations. In addition, as tours are segmented by hubs, the shared substructures between different tours (around the same hub) can be reused to speed up query processing both within and across iterations. Given the key roles played by the hubs, we further investigate the problem of hub selection. In particular, we develop a conceptual model to select hubs based on the two desirable properties of hubs, sharing and discriminating, and present several different strategies to realize the conceptual model. Finally, we evaluate FastPPV over two real-world graphs, and show that it not only significantly outperforms two state-of-the-art baselines in both online and offline phases, but also scales well on larger graphs. In particular, we are able to achieve near-constant time online query processing irrespective of graph size.
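The idea of an incremental, accuracy-aware estimate can be illustrated with a much simpler partitioning than the one the abstract describes. The sketch below is not FastPPV: it groups random walk tours by their plain length (number of steps) rather than by hub-length, and it uses no hubs or precomputation. It assumes a teleport probability `alpha`, a graph stored as an adjacency dictionary in which every node has at least one out-neighbour, and a hypothetical function name `incremental_ppv`. Under those assumptions, each iteration adds the contribution of tours one step longer, and the probability mass still unassigned, `(1 - alpha) ** k` after `k` iterations, is the L1 error of the current estimate, so the error shrinks exponentially and is known at query time.

```python
from typing import Dict, List, Tuple

# Assumption: node -> list of out-neighbours; every node has at least one.
Graph = Dict[int, List[int]]


def incremental_ppv(graph: Graph, query: int, alpha: float = 0.15,
                    max_error: float = 1e-3, max_iters: int = 100
                    ) -> Tuple[Dict[int, float], float, int]:
    """Accumulate PPV contributions of tours of length 0, 1, 2, ...

    Returns (estimate, l1_error, iterations_used). This partitions tours
    by walk length, a simplification of the hub-length scheduling named
    in the abstract.
    """
    estimate = {v: 0.0 for v in graph}
    # frontier[v]: probability that a walk from `query` has taken exactly
    # k steps, has not stopped yet, and currently sits at v.
    frontier = {query: 1.0}
    remaining = 1.0  # probability mass not yet assigned to the estimate
    iters = 0
    while remaining > max_error and iters < max_iters:
        next_frontier: Dict[int, float] = {}
        for v, mass in frontier.items():
            estimate[v] += alpha * mass  # the walk stops at v
            share = (1.0 - alpha) * mass / len(graph[v])
            for w in graph[v]:  # the walk continues to a neighbour
                next_frontier[w] = next_frontier.get(w, 0.0) + share
        frontier = next_frontier
        remaining *= 1.0 - alpha
        iters += 1
    return estimate, remaining, iters


if __name__ == "__main__":
    toy = {0: [1, 2], 1: [2], 2: [0]}
    coarse, coarse_err, coarse_iters = incremental_ppv(toy, 0, max_error=0.5)
    fine, fine_err, fine_iters = incremental_ppv(toy, 0, max_error=1e-6)
    print(coarse_iters, coarse_err, coarse)
    print(fine_iters, fine_err, fine)
```

Because every iteration only adds non-negative mass, a caller can stop as soon as the reported error is acceptable and resume later for a tighter estimate, which is the behaviour the abstract refers to as incremental enhancement.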
Related resources
Incremental and Accuracy-Aware Personalized PageRank through Scheduled Approximation
As Personalized PageRank has been widely leveraged for ranking on a graph, the efficient computation of Personalized PageRank Vector (PPV) becomes a prominent issue. In this paper, we propose FastPPV, an approximate PPV computation algorithm that is incremental and accuracy-aware. Our approach hinges on a novel paradigm of scheduled approximation: the computation is partitioned and scheduled fo...
Approximating Personalized PageRank with Minimal Use of Web Graph Data
In this paper, we consider the problem of calculating fast and accurate approximations to the personalized PageRank score ([8, 16]) of a webpage. We focus on techniques to improve speed by limiting the amount of webgraph data we need to access. PageRank scores are mainly used for ranking purposes, and generally only the scores exceeding a given threshold are relevant. In practice, and relative ...
Personalized PageRank Solution Paths
Personalized PageRank vectors used for many community detection and graph diffusion problems have a subtle dependence on a parameter epsilon that controls their accuracy. This parameter governs the sparsity of the solution and can be interpreted as a regularization parameter. We study algorithms to estimate the solution path as a function of the sparsity and propose two methods for this task. T...
Oolong: Programming Asynchronous Distributed Applications with Triggers
Oolong targets: convergence (inherent detection of termination without a separate check job); incremental recomputation; asynchronous execution without global barriers. Examples: crawling, incremental PageRank, SSSP. [Diagram labels: long trigger thread, retrigger, table, scheduled, fire trigger, enqueue.] dists = Table(int, double) nodes = Table(int, Node) initialize all dists <infinity enable SSSP_Trig...
Community Detection Using Time-Dependent Personalized PageRank
Local graph diffusions have proven to be valuable tools for solving various graph clustering problems. As such, there has been much interest recently in efficient local algorithms for computing them. We present an efficient local algorithm for approximating a graph diffusion that generalizes both the celebrated personalized PageRank and its recent competitor/companion the heat kernel. Our algor...